{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from qdrant_haystack import QdrantDocumentStore\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Qdrant Vector Store\n",
    "\n",
    "This is an example of using the [Qdrant](https://qdrant.tech/) vector store with fastRAG. It relies on the `qdrant_haystack` package and the `qdrant_client` Python connector. We assume you have a running server, e.g. started locally with `docker run -p 6333:6333 qdrant/qdrant`.\n",
    "\n",
    "Two important settings are the dimension of the vectors and the HNSW parameters. Qdrant uses an HNSW index for faster search, which trades accuracy against latency. In general, higher values mean better accuracy at the cost of higher latency and larger RAM usage.\n",
    "\n",
    "The parameters are specified when connecting to the server and creating a new index; they cannot be changed after the index has been created."
   ]
  },
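  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get a feel for the RAM side of this tradeoff, here is a back-of-the-envelope sketch. It assumes float32 vectors and an HNSW graph in which each point stores up to `2 * m` neighbour ids of 4 bytes each on the bottom layer. Real usage also depends on payloads, upper graph layers and Qdrant internals, so treat the numbers as rough. `estimate_memory_mb` is a hypothetical helper, not part of Qdrant or fastRAG.\n",
    "\n",
    "```python\n",
    "def estimate_memory_mb(num_vectors, dim, m):\n",
    "    # Raw vector storage: 4 bytes per float32 component.\n",
    "    vector_bytes = num_vectors * dim * 4\n",
    "    # Assumed graph storage: up to 2 * m neighbour ids (4 bytes each) per point.\n",
    "    graph_bytes = num_vectors * 2 * m * 4\n",
    "    return (vector_bytes + graph_bytes) / 1024 ** 2\n",
    "\n",
    "# Compare a few values of m for one million 100-dimensional vectors.\n",
    "for m in (16, 64, 128):\n",
    "    print(m, round(estimate_memory_mb(1_000_000, 100, m), 1))\n",
    "```\n",
    "\n",
    "Under these assumptions the graph term grows linearly with `m`, which is why large values of `m` noticeably increase memory use."
   ]
  },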
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "dim = 100\n",
    "index_name = \"test_hnsw\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating an Index\n",
    "\n",
    "You need to specify the location of the Qdrant service, the vector dimension, the index name, the similarity metric and, optionally, the HNSW configuration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
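    "q = QdrantDocumentStore(url=\"http://localhost:6333\",  # address of the running Qdrant server\n",
    "                        embedding_dim=dim,\n",
    "                        timeout=60,\n",
    "                        index=index_name,\n",
    "                        embedding_field=\"embedding\",\n",
    "                        # m: links per graph node; ef_construct: neighbours considered while building the index\n",
    "                        hnsw_config={\"m\": 128, \"ef_construct\": 100},\n",
    "                        similarity=\"dot_product\",\n",
    "                        recreate_index=True)  # replace any existing index with the same name"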
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Insertion and Searching of Documents\n",
    "\n",
    "We'll create a few documents; each must have `id`, `content` and `embedding` keys, but can contain more data such as text titles."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "docs = [{\"id\": 1, \"content\": \"I like to go to the beach\", \"embedding\": np.ones(dim)},\n",
    "        {\"id\": 2, \"content\": \"Where is my hat?\", \"embedding\": np.ones(dim) * 2},\n",
    "        {\"id\": 3, \"content\": \"GPT4 is very nice\", \"embedding\": np.ones(dim) * 3},]"
   ]
  },
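  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before writing, it can help to check that every document carries the required keys and an embedding of the expected size. The following is a minimal sketch of such a check; `check_docs` and `REQUIRED_KEYS` are hypothetical names introduced here, not part of `qdrant_haystack`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "dim = 100  # must match the embedding_dim given to the document store\n",
    "REQUIRED_KEYS = {'id', 'content', 'embedding'}\n",
    "\n",
    "def check_docs(docs, dim):\n",
    "    # Raise early if a document lacks a key or has a wrongly sized vector.\n",
    "    for doc in docs:\n",
    "        doc_id = doc.get('id')\n",
    "        missing = REQUIRED_KEYS.difference(doc)\n",
    "        if missing:\n",
    "            raise ValueError(f'document {doc_id} is missing keys: {sorted(missing)}')\n",
    "        shape = np.asarray(doc['embedding']).shape\n",
    "        if shape != (dim,):\n",
    "            raise ValueError(f'document {doc_id} has embedding shape {shape}, expected ({dim},)')\n",
    "\n",
    "sample = [{'id': 1, 'content': 'I like to go to the beach', 'embedding': np.ones(dim)}]\n",
    "check_docs(sample, dim)  # passes silently when the documents are well formed\n",
    "```"
   ]
  },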
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Write the documents to the index in batches; deduplication of documents is on by default."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "500it [00:00, 598.14it/s]\n"
     ]
    }
   ],
   "source": [
    "q.write_documents(docs, index_name, batch_size=500)"
   ]
  },
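  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Batching means the documents are sent in chunks of at most `batch_size` rather than in one request. The idea can be sketched in plain Python; `batched` is a hypothetical helper for illustration and not necessarily how `qdrant_haystack` implements it.\n",
    "\n",
    "```python\n",
    "def batched(items, batch_size):\n",
    "    # Yield consecutive slices of at most batch_size items.\n",
    "    for start in range(0, len(items), batch_size):\n",
    "        yield items[start:start + batch_size]\n",
    "\n",
    "fake_docs = list(range(1200))  # stand-in for 1200 documents\n",
    "print([len(batch) for batch in batched(fake_docs, batch_size=500)])  # [500, 500, 200]\n",
    "```\n",
    "\n",
    "With only three documents and `batch_size=500`, everything fits in a single batch."
   ]
  },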
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.get_document_count(index=index_name)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Query by Embedding\n",
    "You need to provide a vector and a `top_k` value. Documents can also be queried by text search, which we won't show here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<Document: {'content': 'GPT4 is very nice', 'content_type': 'text', 'score': 0.9525741268224334, 'meta': {}, 'id_hash_keys': ['content'], 'embedding': '<embedding of shape (100,)>', 'id': '3'}>]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "q.query_by_embedding(np.ones(dim), top_k=1, index=index_name)"
   ]
  },
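  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why document 3 is returned first, the ranking can be reproduced with an exact dot product over the three toy embeddings. The sigmoid scaling in the last step is an assumption about how raw dot products are mapped into the unit interval; it happens to reproduce the score shown above (about 0.9526).\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "dim = 100\n",
    "query = np.ones(dim)\n",
    "# Same toy embeddings as the documents written earlier, keyed by id.\n",
    "embeddings = {1: np.ones(dim), 2: np.ones(dim) * 2, 3: np.ones(dim) * 3}\n",
    "\n",
    "# Exact (brute-force) dot product between the query and every document.\n",
    "raw_scores = {doc_id: float(query @ emb) for doc_id, emb in embeddings.items()}\n",
    "best_id = max(raw_scores, key=raw_scores.get)\n",
    "\n",
    "# Assumption: the raw dot product is squashed with a sigmoid of score / 100.\n",
    "scaled = 1 / (1 + np.exp(-raw_scores[best_id] / 100))\n",
    "print(best_id, raw_scores[best_id], scaled)\n",
    "```\n",
    "\n",
    "An HNSW index approximates this exhaustive comparison; on a collection this small the approximate and exact results coincide."
   ]
  },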
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "fastrag",
   "language": "python",
   "name": "fastrag"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
